Amendments to the Specification:

Rewrite the paragraph at page [illegible], lines 8 to 10 as follows:



This application claims priority under 35 USC §119(e)(1) of Provisional Application No.
60/183,527, filed February 18, 2000 (TI 30302PS) and of Provisional Application No.
60/181,654, filed February 18, 2000 (TI 26010PS).






Rewrite the paragraph at page 6, line 16 to page 7, line 5 as follows: 



In microprocessor 1 there are shown a central processing unit (CPU) 10, data memory 22,
program memory 23, peripherals 60 and an external memory interface (EMIF) with a direct
memory access (DMA) 61. CPU 10 further has an instruction fetch/decode unit 10a-c, a
plurality of execution units, including an arithmetic and load/store unit D1, a multiplier M1, an
ALU/shifter unit S1, an arithmetic logic unit ("ALU") L1, and a shared multi-port register file 20a
from which data are read and to which data are written. Instructions are fetched by fetch unit
10a from instruction memory 23 over a set of busses 41. Decoded instructions are provided from
the instruction fetch/decode unit 10a-c to the functional units D1, M1, S1, and L1 over various
sets of control lines which are not shown. Data are provided to/from the register file 20a from/to
load/store unit D1 over a first set of busses 32a, to multiplier M1 over a second set of
busses 34a, to ALU/shifter unit S1 over a third set of busses 36a and to ALU L1 over a fourth set
of busses 38a. Data are provided to/from the memory 22 from/to the load/store unit D1 via
a fifth set of busses 40a. Note that the entire data path described above is duplicated with
register file 20b and execution units D2, M2, S2, and L2. Load/store unit D2 similarly interfaces
with memory 22 via a set of busses 40b. In this embodiment of the present invention, two
unrelated aligned double word (64 bits) load/store transfers can be made in parallel between CPU
10 and data memory 22 on each clock cycle using bus set 40a and bus set 40b.

Rewrite the paragraph at page 8, lines [illegible] to [illegible] as follows:
When microprocessor 1 is incorporated in a data processing system, additional memory
or peripherals may be connected to microprocessor 1, as illustrated in Figure 1. For example,
Random Access Memory (RAM) 70, a Read Only Memory (ROM) 71 and a Disk 72 are shown
connected via an external bus 73. Bus 73 is connected to the External Memory Interface (EMIF)
which is part of functional block 61 within microprocessor 1. A Direct Memory Access (DMA)
controller is also included within block 61. The DMA controller part of functional block 61
connects to data memory 22 via bus 43 and is generally used to move data between memory and
peripherals within microprocessor 1 and memory and peripherals which are external to
microprocessor 1.



Rewrite the paragraph at page 8, lines 16 to 22 as follows:




A detailed description of various architectural features of the microprocessor of Figure 1
is provided in coassigned application S.N. 09/012,813 (TI 25311), U.S. Patent 6,182,203, and is
incorporated herein by reference. A description of enhanced architectural features and an
extended instruction set not described herein for CPU 10 is provided in coassigned U.S. Patent
application S.N. 09/703,096 (TI 30302) entitled Microprocessor with Improved
Instruction Set Architecture and is incorporated herein by reference.



Rewrite the paragraph at page 8, line 23 to page 9, line 2 as follows: 



Figure 2 is a block diagram of the execution units and register files of the microprocessor 
of Figure 1 and shows a more detailed view of the buses connecting the various functional 
blocks. In this figure, all data busses are 32 bits wide, unless otherwise noted. There are two 
general-purpose register files (A and B) in the processor's data paths. Each of these files contains 
32 32-bit registers (A0-A31 for register file A 20a and B0-B31 for register file B 20b). The
general-purpose registers can be used for data, data address pointers, or condition registers. Any 
number of reads of a given register can be performed in a given cycle. 



Rewrite the paragraph at page 11, line 3 to page 12, line 7 as follows: 
Most data lines in the CPU support 32-bit operands, and some support long (40-bit) and
double word (64-bit) operands. Each functional unit has its own 32-bit write port into a
general-purpose register file (refer to Figure 2). All units ending in 1 (for example, .L1) write to register
file A 20a and all units ending in 2 write to register file B 20b. Each functional unit has two
32-bit read ports for source operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an extra
8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit long reads. Because
each unit has its own 32-bit write port, when performing 32-bit operations all eight units can be
used in parallel every cycle. Since each multiplier can return up to a 64-bit result, two write
ports (dst1 and dst2) are provided from the multipliers to the respective register file.
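
The port arrangement for 40-bit long operands can be pictured with a short Python sketch. It assumes, for illustration only, that the low 32 bits travel on the unit's regular 32-bit port and the top 8 bits on the extra 8-bit port; the paragraph above does not state the bit assignment, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK8 = 0xFF

def split_long40(value):
    """Split a 40-bit value into (low32, high8).

    Assumption: low 32 bits use the 32-bit port, top 8 bits use the 8-bit port.
    """
    if not 0 <= value < (1 << 40):
        raise ValueError("value must fit in 40 bits")
    return value & MASK32, (value >> 32) & MASK8

def join_long40(low32, high8):
    """Rebuild the 40-bit value from the two port-sized pieces."""
    return ((high8 & MASK8) << 32) | (low32 & MASK32)
```

Splitting and rejoining any 40-bit value returns the original value, which is the only property the sketch is meant to show.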



Rewrite the paragraph at page 12, lines 10 to 18 as follows: 






Each functional unit reads directly from and writes directly to the register file within its
own data path. That is, the .L1 unit 18a, .S1 unit 16a, .D1 unit 12a, and .M1 unit 14a write
to register file A 20a and the .L2 unit 18b, .S2 unit 16b, .D2 unit 12b, and .M2 unit 14b
write to register file B 20b. The register files are connected to the opposite-side register file's
functional units via the 1X and 2X cross paths. These cross paths allow functional units from
one data path to access a 32-bit operand from the opposite side's register file. The 1X cross path
allows data path A's functional units to read their source from register file B. Similarly, the 2X
cross path allows data path B's functional units to read their source from register file A.
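
The routing just described can be summarized in a minimal Python sketch. The function names and the string encoding of unit names are hypothetical and are used only to restate the rule that units ending in 1 belong to register file A and reach register file B through the 1X cross path, and conversely for units ending in 2.

```python
def home_register_file(unit):
    """Return 'A' for units ending in 1 and 'B' for units ending in 2."""
    name = unit.lstrip(".").upper()
    if len(name) != 2 or name[0] not in "LSDM" or name[1] not in "12":
        raise ValueError(f"unknown functional unit: {unit!r}")
    return "A" if name[1] == "1" else "B"

def cross_path(unit):
    """Return (cross path name, opposite register file read through it)."""
    if home_register_file(unit) == "A":
        return "1X", "B"
    return "2X", "A"
```

For example, cross_path(".S1") gives ("1X", "B"), matching the statement that data path A's units read from register file B over the 1X cross path.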



Rewrite the paragraph at page 12, lines 19 to 23 as follows:



All eight of the functional units have access to the opposite side's register file via a cross
path. The .M1, .M2, .S1, .S2, .D1, and .D2 units' src2 inputs are selectable between the cross
path and the same-side register file. In the case of the .L1 and .L2 units, both src1 and src2 inputs are
also selectable between the cross path and the same-side register file. Cross path 1X bus 210
couples to one input of multiplexer 211 for src1 input of .L1 unit 18a, multiplexer 212 for src2
input of .L1 unit 18a, multiplexer 213 for src2 input of .S1 unit 16a and multiplexer 214 for src2
input of .M1 unit 14a. Multiplexers 211, 212, 213, and 214 select between the cross path 1X bus
210 and an output of register file A 20a. Buffer 250 buffers cross path 2X output to similar
multiplexers for .L2, .S2, .M2, and .D2 units.
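
The operand selection performed by these multiplexers can be modeled, under stated assumptions, as follows. The table and function below are hypothetical names for illustration; they encode only the rule given above that the .M, .S, and .D units may take src2 from the cross path while the .L units may take either src1 or src2 from it.

```python
# Which source inputs of each unit type may select the cross path.
CROSS_CAPABLE = {
    "L": {"src1", "src2"},
    "S": {"src2"},
    "M": {"src2"},
    "D": {"src2"},
}

def select_operand(unit, operand, use_cross, same_side_value, cross_value):
    """Model one source multiplexer: pick the cross path or same-side value."""
    kind = unit.lstrip(".")[:1].upper()
    if kind not in CROSS_CAPABLE:
        raise ValueError(f"unknown functional unit: {unit!r}")
    if use_cross and operand not in CROSS_CAPABLE[kind]:
        raise ValueError(f"{unit} {operand} cannot select the cross path")
    return cross_value if use_cross else same_side_value
```

The sketch rejects a cross path request on an input that has no multiplexer, such as src1 of an .M unit.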

Insert the following new paragraph at page 13, between lines 9 and 10:



S2 unit 16b may write to control register file 102 from its dst output via bus 220. S2 unit
16b may read from control register file 102 to its src2 input via bus 221.






Rewrite the paragraph at page 13, line 24 to page 14, line 4 as follows:



Bus 40a has an address bus DA1 which is driven by mux 200a. This allows an address
generated by either load/store unit D1 or D2 to provide a memory address for loads or stores for
register file 20a. Data bus LD1 loads data from an address in memory 22 specified by address
bus DA1 to a register in load unit D1. Unit D1 may manipulate the data provided prior to storing
it in register file 20a. Likewise, data bus ST1 stores data from register file 20a to memory 22.
Load/store unit D1 performs the following operations: 32-bit add, subtract, linear and circular
address calculations. Load/store unit D2 operates similarly to unit D1 via bus 40b, with the
assistance of mux 200b for selecting an address.
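
The difference between linear and circular address calculation can be illustrated with a short Python sketch. The paragraph above does not define how the circular block is sized or aligned, so the sketch assumes a power-of-two block aligned to its own size, which is a common convention and not a statement about this embodiment. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def linear_address(base, offset):
    """32-bit linear address calculation; the sum wraps modulo 2**32."""
    return (base + offset) & MASK32

def circular_address(base, offset, block_size):
    """Address calculation that wraps inside an aligned block.

    Assumption: block_size is a power of two and the block is aligned to it,
    so only the low address bits change and the upper bits are preserved.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    mask = block_size - 1
    return (base & ~mask & MASK32) | ((base + offset) & mask)
```

With a 16-byte block, stepping 8 bytes forward from 0x1000000C yields 0x10000004 rather than 0x10000014, because the address wraps within the block.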



Rewrite the paragraph at page 14, lines 15 to 23 as follows: 



Table 3 defines the mapping between instructions and functional units for a set of basic 
instructions included in a DSP described in U.S. Patent 6,182,203 (S.N. 09/012,813, TI 25311),
incorporated herein by reference. Table 4 defines a mapping between instructions and
functional units for a set of extended instructions in an embodiment of the present invention. 
Alternative embodiments of the present invention may have different sets of instructions and 
functional unit mapping. Table 3 and Table 4 are illustrative and are not exhaustive or intended 
to limit various embodiments of the present invention. 
The remaining steps 620, 630 and 640 are identical to Figure 6A.

Rewrite the paragraph at page [illegible], lines [illegible] to 27 as follows:

Rewrite the paragraph at page 19, lines 12 to 18 as follows: 



Performance can be inhibited by stalls from the memory system, stalls for cross path 
dependencies, or interrupts. The reasons for memory stalls are determined by the memory 
architecture. Cross path stalls are described in detail in U.S. Patent S.N. 09/702,453
(TI 30563) to Steiss, et al., which is incorporated herein by reference. To fully understand how to
optimize a program for speed, the sequence of program fetch, data store, and data load requests 
the program makes, and how they might stall the CPU should be understood. 



Rewrite the paragraph at page 27, line 25 as follows: 



The .M unit has three major functional units: Galois multiply unit 700a-c, multiply unit
710 and other non-multiply functional circuitry in block 720. Galois multiplier 700a-c and
multiplier 710 require three additional cycles to complete the multiply operations, so multiply
instructions are categorized as having three delay slots. Pipeline registers 730-733 hold partial
results between each pipeline execution phase. In general, multiply unit 710 can perform the
following operations on a pair of multipliers 711a,b: two 16x16 multiplies or four 8x8 multiplies
with all combinations of signed or unsigned numbers, Q-shifting and P-shifting of multiply
results, rounding for multiply instructions, controlling the carry chain by breaking/joining the
carry chain at 16-bit block boundaries, and saturation multiplication where the final result is
shifted left by 1 or returns 0x7FFFFFFF if an overflow occurs. Galois multiply unit 700
performs Galois multiply in parallel with M multiply unit 710. The lower 32 bits (bits 31:0) of a
result are selected by multiplexer 734 and are stored in the even register of a register pair. The
upper 32 bits (bits 63:32) of the result are selected by multiplexer 735 and are stored in the odd
register of the register pair. A more detailed description of configurable multiply circuitry is
provided in co-assigned U.S. Patent application S.N. 09/703,093 (TI 26010) entitled
Data Processor With Flexible Multiply Unit and is incorporated herein by reference. Details of
the Galois multiply unit are provided in co-assigned U.S. Patent application S.N. 09/507,187
(TI 26013) to David Hoyle entitled Galois Field Multiply and is incorporated herein by
reference.
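
Two of the behaviors described above, saturation multiplication and the placement of a 64-bit result in an even/odd register pair, can be sketched in Python. The sketch assumes signed 16-bit operands for the saturating case and returns results as unsigned 32-bit patterns; the function names are hypothetical and the code is a behavioral illustration, not a description of the circuitry.

```python
INT32_MAX = 0x7FFFFFFF
MASK32 = 0xFFFFFFFF

def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def saturating_multiply(a, b):
    """Signed 16x16 multiply, shifted left by 1, saturating on overflow.

    The only overflowing case is (-32768) * (-32768), whose shifted product
    2**31 does not fit in a signed 32-bit result; 0x7FFFFFFF is returned.
    """
    result = (to_signed16(a) * to_signed16(b)) << 1
    if result > INT32_MAX:
        return INT32_MAX
    return result & MASK32

def split_register_pair(result64):
    """Return (even_register, odd_register) for a 64-bit result.

    The even register holds bits 31:0 and the odd register the upper 32 bits.
    """
    result64 &= 0xFFFFFFFFFFFFFFFF
    return result64 & MASK32, result64 >> 32
```

For instance, multiplying 0x8000 by 0x8000 saturates to 0x7FFFFFFF, while 0x4000 by 0x4000 gives 0x20000000.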



